PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Abstract
In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can be easily extracted from the web, making it suitable for less-resourced languages as well. We test it on Italian and make available the biggest Italian corpus for automatic text simplification.
Similar resources
Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification
This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities. A corpus analysis of original and manually simplified news articles was undertaken in order to identify and quantify relevant operations to be implemented in a text simplification system. The articles were further compared at sentence and text level ...
Automatic Simplification of Spanish Text for e-Accessibility
In this paper we present an automatic text simplification system for Spanish, intended to make texts more accessible for users with cognitive disabilities. The system aims to reduce the structural complexity of Spanish sentences by converting complex sentences into two or more simple sentences, thereby reducing reading difficulty.
A Tagging Approach to Identify Complex Constituents for Text Simplification
The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, complementisers, wh-words, and punctuation marks. T...
An Open Corpus of Everyday Documents for Simplification Tasks
In recent years, interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia or from Simple Wikipedia revision histories and generate Simple Wikipedia articles. In this work we motivate the need to construct a large, accessible corpus of everyday documents along with t...
Learning When to Simplify Sentences for Natural Text Simplification
This paper introduces a corpus-based approach for selecting sentences that require simplification in the context of a Brazilian Portuguese text simplification system. Based on a parallel corpus of original and simplified text versions, we apply a binary classifier to decide in which circumstances a sentence should or should not be split – which is the most important syntactic simplification operation – ...